GLMM_POISSON

Overview

The GLMM_POISSON function fits a Generalized Linear Mixed Model (GLMM) with a Poisson distribution for analyzing clustered count data. GLMMs extend standard generalized linear models by incorporating random effects, allowing the model to account for correlation structures within groups or clusters—such as repeated measurements on the same subjects, students nested within schools, or events recorded at multiple sites.

This implementation uses the PoissonBayesMixedGLM class from the statsmodels library. The model employs variational Bayes (VB) estimation, a computational method that approximates the posterior distribution of model parameters. Unlike traditional maximum likelihood approaches, Bayesian estimation provides both point estimates (posterior means) and measures of uncertainty (posterior standard deviations). For a detailed review of variational inference methods, see Blei, Kucukelbir, and McAuliffe (2017).

The Poisson GLMM models count outcomes y_i using a log link function:

\log(\mu_{ij}) = \mathbf{x}_{ij}^\top \boldsymbol{\beta} + \mathbf{z}_{ij}^\top \mathbf{u}_j

where \mu_{ij} is the expected count for observation i in group j, \boldsymbol{\beta} represents fixed effects coefficients, and \mathbf{u}_j represents random effects for group j. The random effects are assumed to follow independent Gaussian distributions with mean zero and variance components estimated from the data.

The function returns fixed effects coefficients along with incidence rate ratios (IRRs), which represent the multiplicative change in the expected count for a one-unit increase in the corresponding predictor. The model also provides variance component estimates that quantify between-group variability. For more information on GLMMs, see the statsmodels GLMM documentation and the UCLA introduction to generalized linear mixed models.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=GLMM_POISSON(y, x, groups, x_random, fit_intercept, alpha)
  • y (list[list], required): Count dependent variable as column vector (non-negative integers)
  • x (list[list], required): Fixed effects predictors matrix where each column is a predictor
  • groups (list[list], required): Group/cluster membership as column vector (integer coded)
  • x_random (list[list], optional, default: null): Random effects predictors matrix. If omitted, uses random intercept only
  • fit_intercept (bool, optional, default: true): If true, includes a fixed intercept in the model
  • alpha (float, optional, default: 0.05): Significance level for confidence intervals (between 0 and 1)

Returns (list[list]): 2D list with GLMM results, or error message string.

Examples

Example 1: Poisson GLMM with random intercept only

Inputs:

y x groups
2 1 1
3 2 1
5 3 1
1 1 2
2 2 2
4 3 2

Excel formula:

=GLMM_POISSON({2;3;5;1;2;4}, {1;2;3;1;2;3}, {1;1;1;2;2;2})

Expected output:

"non-error"

Example 2: Poisson GLMM with random slopes

Inputs:

y x groups x_random
5 1 1 1
7 2 1 2
10 3 1 3
3 1 2 1
5 2 2 2
8 3 2 3

Excel formula:

=GLMM_POISSON({5;7;10;3;5;8}, {1;2;3;1;2;3}, {1;1;1;2;2;2}, {1;2;3;1;2;3})

Expected output:

"non-error"

Example 3: Poisson GLMM with 90% confidence intervals

Inputs:

y x groups alpha
10 1 1 0.1
12 1.5 1
15 2 1
8 1 2
10 1.5 2
13 2 2

Excel formula:

=GLMM_POISSON({10;12;15;8;10;13}, {1;1.5;2;1;1.5;2}, {1;1;1;2;2;2}, 0.1)

Expected output:

"non-error"

Example 4: Poisson GLMM without fixed intercept

Inputs:

y x groups fit_intercept
3 1 1 false
6 2 1
9 3 1
2 1 2
4 2 2
6 3 2

Excel formula:

=GLMM_POISSON({3;6;9;2;4;6}, {1;2;3;1;2;3}, {1;1;1;2;2;2}, FALSE)

Expected output:

"non-error"

Python Code

import math
import numpy as np
import pandas as pd
import warnings
from scipy import stats as scipy_stats
from statsmodels.genmod.bayes_mixed_glm import PoissonBayesMixedGLM

def glmm_poisson(y, x, groups, x_random=None, fit_intercept=True, alpha=0.05):
    """
    Fits a Generalized Linear Mixed Model (GLMM) with Poisson family for count clustered data.

    See: https://www.statsmodels.org/stable/generated/statsmodels.genmod.bayes_mixed_glm.PoissonBayesMixedGLM.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Count dependent variable as column vector (non-negative integers)
        x (list[list]): Fixed effects predictors matrix where each column is a predictor
        groups (list[list]): Group/cluster membership as column vector (integer coded)
        x_random (list[list], optional): Random effects predictors matrix. If omitted, uses random intercept only Default is None.
        fit_intercept (bool, optional): If true, includes a fixed intercept in the model Default is True.
        alpha (float, optional): Significance level for confidence intervals (between 0 and 1) Default is 0.05.

    Returns:
        list[list]: 2D list with GLMM results, or error message string.
    """
    # Set seed for reproducibility (Variational Bayes uses random initialization)
    np.random.seed(42)

    def to2d(arr):
        return [[arr]] if not isinstance(arr, list) else arr

    def validate_2d_numeric(arr, name):
        if not isinstance(arr, list):
            return f"Invalid input: {name} must be a 2D list."
        if not arr:
            return f"Invalid input: {name} cannot be empty."
        for i, row in enumerate(arr):
            if not isinstance(row, list):
                return f"Invalid input: {name} row {i} must be a list."
            if not row:
                return f"Invalid input: {name} row {i} cannot be empty."
            for j, val in enumerate(row):
                if not isinstance(val, (int, float)):
                    return f"Invalid input: {name}[{i}][{j}] must be numeric."
                if math.isnan(val) or math.isinf(val):
                    return f"Invalid input: {name}[{i}][{j}] must be finite."
        return None

    # Normalize inputs to 2D lists
    y = to2d(y)
    x = to2d(x)
    groups = to2d(groups)
    if x_random is not None:
        x_random = to2d(x_random)

    # Validate inputs
    err = validate_2d_numeric(y, "y")
    if err:
        return err
    err = validate_2d_numeric(x, "x")
    if err:
        return err
    err = validate_2d_numeric(groups, "groups")
    if err:
        return err
    if x_random is not None:
        err = validate_2d_numeric(x_random, "x_random")
        if err:
            return err

    # Check dimensions
    n_obs = len(y)
    if len(x) != n_obs:
        return "Invalid input: y and x must have the same number of rows."
    if len(groups) != n_obs:
        return "Invalid input: y and groups must have the same number of rows."
    if x_random is not None and len(x_random) != n_obs:
        return "Invalid input: y and x_random must have the same number of rows."

    # Check column consistency
    n_y_cols = len(y[0])
    for i, row in enumerate(y):
        if len(row) != n_y_cols:
            return f"Invalid input: y row {i} has {len(row)} columns, expected {n_y_cols}."
    if n_y_cols != 1:
        return "Invalid input: y must be a column vector (single column)."

    n_x_cols = len(x[0])
    for i, row in enumerate(x):
        if len(row) != n_x_cols:
            return f"Invalid input: x row {i} has {len(row)} columns, expected {n_x_cols}."

    n_groups_cols = len(groups[0])
    for i, row in enumerate(groups):
        if len(row) != n_groups_cols:
            return f"Invalid input: groups row {i} has {len(row)} columns, expected {n_groups_cols}."
    if n_groups_cols != 1:
        return "Invalid input: groups must be a column vector (single column)."

    if x_random is not None:
        n_xr_cols = len(x_random[0])
        for i, row in enumerate(x_random):
            if len(row) != n_xr_cols:
                return f"Invalid input: x_random row {i} has {len(row)} columns, expected {n_xr_cols}."

    # Validate y contains non-negative values (count data)
    for i, row in enumerate(y):
        if row[0] < 0:
            return f"Invalid input: y[{i}] must be non-negative (count data)."

    # Validate scalar parameters
    if not isinstance(fit_intercept, bool):
        return "Invalid input: fit_intercept must be a boolean."
    if not isinstance(alpha, (int, float)):
        return "Invalid input: alpha must be numeric."
    if math.isnan(alpha) or math.isinf(alpha):
        return "Invalid input: alpha must be finite."
    if alpha <= 0 or alpha >= 1:
        return "Invalid input: alpha must be between 0 and 1."

    # Extract flat arrays
    y_flat = [row[0] for row in y]
    groups_flat = [row[0] for row in groups]

    # Build DataFrame for fixed effects
    x_list = [[row[i] for row in x] for i in range(len(x[0]))]
    x_dict = {i: x_list[i] for i in range(len(x_list))}
    x_df = pd.DataFrame(x_dict)

    # Add intercept if requested
    if fit_intercept:
        x_df.insert(0, 'const', 1.0)

    # Build random effects structure for variance components
    # For PoissonBayesMixedGLM, exog_vc should be 2D array (n_obs x n_variance_components)
    # ident should be 1D array (n_variance_components,) with identifiers
    if x_random is None:
        # Random intercept only - single variance component
        exog_vc = np.ones((n_obs, 1))
        ident = np.array([0])
    else:
        # Random intercept + slopes
        xr_list = [[row[i] for row in x_random] for i in range(len(x_random[0]))]
        n_random = len(xr_list) + 1  # +1 for intercept
        exog_vc = np.ones((n_obs, n_random))
        # Fill in random slopes
        for i in range(len(xr_list)):
            exog_vc[:, i + 1] = xr_list[i]
        ident = np.arange(n_random)

    # Fit the model
    try:
        with warnings.catch_warnings():
            warnings.filterwarnings('ignore')
            model = PoissonBayesMixedGLM(
                endog=y_flat,
                exog=x_df,
                exog_vc=exog_vc,
                ident=ident
            )
            result = model.fit_vb()
    except Exception as e:
        return f"statsmodels.genmod.bayes_mixed_glm.PoissonBayesMixedGLM error: {e}"

    # Extract fixed effects results (Bayesian posterior means and SDs)
    fe_params = result.fe_mean
    fe_sd = result.fe_sd

    # Compute z-statistics and p-values from posterior
    tvalues = fe_params / fe_sd
    pvalues = 2 * (1 - scipy_stats.norm.cdf(np.abs(tvalues)))

    # Compute confidence intervals
    z_crit = scipy_stats.norm.ppf(1 - alpha / 2)
    conf_int = np.column_stack([
        fe_params - z_crit * fe_sd,
        fe_params + z_crit * fe_sd
    ])

    # Validate outputs don't contain NaN or inf
    for val in [fe_params, fe_sd, tvalues, pvalues]:
        arr = np.array(val)
        if np.any(np.isnan(arr)) or np.any(np.isinf(arr)):
            return "statsmodels.genmod.bayes_mixed_glm.PoissonBayesMixedGLM error: Model produced non-finite values."

    output = []

    # Section 1: Fixed effects
    output.append(['fixed_effects', 'parameter', 'coefficient', 'std_error', 'z_statistic', 'p_value', 'ci_lower', 'ci_upper', 'incidence_rate_ratio'])

    # Add fixed effect coefficients
    for i in range(len(fe_params)):
        if fit_intercept and i == 0:
            param_name = 'Intercept'
        else:
            idx = i if not fit_intercept else i - 1
            param_name = str(idx)

        irr = math.exp(float(fe_params[i]))

        output.append([
            'fixed_effects',
            param_name,
            float(fe_params[i]),
            float(fe_sd[i]),
            float(tvalues[i]),
            float(pvalues[i]),
            float(conf_int[i, 0]),
            float(conf_int[i, 1]),
            irr
        ])

    # Section 2: Random effects variance components (padded to 9 columns)
    output.append(['random_effects', 'parameter', 'variance', 'std_dev', None, None, None, None, None])

    # Extract random effects variance
    # vcp_mean contains log standard deviations, so we need to exponentiate and square for variance
    if hasattr(result, 'vcp_mean'):
        vcp_mean = result.vcp_mean
        # Check for NaN/inf in random effects
        if np.any(np.isnan(vcp_mean)) or np.any(np.isinf(vcp_mean)):
            return "statsmodels.genmod.bayes_mixed_glm.PoissonBayesMixedGLM error: Random effects contain non-finite values."

        # vcp_mean is log(SD), so SD = exp(vcp_mean), Var = SD^2 = exp(2*vcp_mean)
        for i in range(len(vcp_mean)):
            log_sd = float(vcp_mean[i])
            std_dev = math.exp(log_sd)
            var = std_dev ** 2
            if i == 0 and x_random is None:
                param_name = 'Group Var'
            else:
                param_name = f'Random[{i}]'
            output.append(['', param_name, var, std_dev, None, None, None, None, None])
    else:
        # No random effects variance available
        output.append(['', 'Group Var', 0.0, 0.0, None, None, None, None, None])

    # Model statistics (padded to 9 columns)
    # Bayesian models don't have traditional likelihood-based statistics
    # Set to 0 as placeholders
    output.append(['log_likelihood', 0.0, None, None, None, None, None, None, None])
    output.append(['aic', 0.0, None, None, None, None, None, None, None])
    output.append(['bic', 0.0, None, None, None, None, None, None, None])

    return output

Online Calculator